Method and device to distribute code and data stores between volatile memory and non-volatile memory

ABSTRACT

A method, device, and system to distribute code and data stores between volatile and non-volatile memory are described. In one embodiment, the method includes storing one or more static code segments of a software application in a phase change memory with switch (PCMS) device, storing one or more static data segments of the software application in the PCMS device, and storing one or more volatile data segments of the software application in a volatile memory device. The method then allocates an address mapping table with at least a first address pointer to point to each of the one or more static code segments, at least a second address pointer to point to each of the one or more static data segments, and at least a third address pointer to point to each of the one or more volatile data segments.

FIELD OF THE INVENTION

The invention relates to allocating a combination of volatile memory and non-volatile memory for storage of code and data sections of a software application.

DESCRIPTION OF THE RELATED ART

Memory performance and capacity requirements continue to increase across many aspects of the computing industry. In addition, memory power requirements and memory cost have become a significant component of the overall power and cost, respectively, of a given computing system ranging from a smart phone to a server. Memory and storage subsystems can increase or decrease the overall performance of a computing device depending on implementation specifics. Because it is generally desirable to have faster performing computing devices that utilize less power and cost less, a wide variety of designs of the memory and storage subsystems exist that attempt to maximize end user perceived performance while minimizing cost and power consumption.

Current operating systems provide an application programming interface (API) to allow applications and drivers to request memory from paged and non-paged pools. Non-paged pools are typically used for data that must not be paged to a mass storage drive (e.g., pages that hardware is expected to access during execution of the application). Because all allocated pages are currently accessed while in DRAM, no additional information is required at page allocation time as to whether the data to be stored within an allocated buffer is expected to be only read or both read and written.

BRIEF DESCRIPTION OF THE DRAWINGS

The following description and accompanying drawings are used to illustrate embodiments of the invention. In the drawings:

FIG. 1 illustrates an embodiment of a memory arrangement that allows a computer system to utilize a hybrid memory structure for code and data storage for processors as well as for embedded systems.

FIG. 2 describes an embodiment of volatile and non-volatile memory spaces when utilizing a software application hybrid code and data storage system.

FIG. 3 is a flow diagram of an embodiment of a process to allocate volatile and non-volatile memory for code and data sections of a software application.

FIG. 4 illustrates an embodiment of a computer system that includes an embedded controller which has direct access to a PCMS device internal to an I/O subsystem.

FIG. 5 describes an embodiment of volatile and non-volatile memory spaces in an I/O subsystem when utilizing a hybrid firmware code and data storage system.

DETAILED DESCRIPTION

Many embodiments described below resolve code and data usage challenges introduced to the memory and storage subsystems of a computing device by subdividing the performance requirement and the capacity requirement between diverse memory technologies. The focus of this approach is on providing performance with a relatively small amount of a higher-speed memory such as dynamic random access memory (DRAM) while implementing the bulk of the system memory using a significantly cheaper and denser non-volatile memory. Several embodiments described below define platform configurations that enable hierarchical memory subsystem organizations for the use of a certain type of non-volatile memory, specifically referred to as non-volatile random access memory (NVRAM) to augment volatile memory, by one or more graphics processors in a computing device. The use of the NVRAM in the memory hierarchy additionally allows non-volatile memory mass storage implementations as a substitute for standard mass storage drives.

FIG. 1 illustrates an embodiment of a memory arrangement that allows a computer system to utilize a hybrid memory structure for code and data storage for processors as well as for embedded systems.

A central processing unit (CPU), a graphics processing unit (GPU), and an embedded processor may be provided access to both volatile and non-volatile forms of memory according to several embodiments. The CPU and GPU can also be referred to as “processors” throughout this document; these different nomenclatures are utilized interchangeably.

FIG. 1 shows a number of logic units which may be located on the same or different semiconductor dies or in the same or different semiconductor packages. Logic units in FIG. 1 include a CPU that has one or more cores and a cache, a GPU that also has one or more cores and a cache, a memory subsystem, and an I/O subsystem. These units are separated by dotted lines to show the possibility that each of these logic blocks may or may not be located in the same semiconductor die and/or package.

Turning now to the detailed elements of FIG. 1, a CPU 100 is present. This CPU includes one or more cores 102. Although not shown, each core may internally include one or more instruction/data caches, execution units, prefetch buffers, instruction queues, branch address calculation units, instruction decoders, floating point units, retirement units, etc. In other embodiments that are not shown, the system may include multiple CPUs, each with its own set of logic units that are displayed in FIG. 1.

The CPU 100 includes at least one lower level cache, such as cache 104. This may be a general purpose cache that is capable of storing a significant amount of data retrieved from memory locations in volatile memory 106 and/or an NVRAM 108. In different embodiments, cache 104 may be shared among all cores or each core may have its own lower level cache.

CPU 100 may also include additional logic that is not shown which coordinates and operates at least core(s) 102. In some embodiments, this additional logic may be referred to as a home agent. The home agent may include, for example, a power control unit (PCU). The PCU may include logic and components needed for regulating the power state of the core(s) 102 among other tasks.

According to several embodiments, the computer system in FIG. 1 additionally includes a GPU 110. The GPU 110 also may include one or more core(s) 112. Each core may include one or more execution units and one or more instruction and data caches utilized to feed the execution units with information to process. Additionally, the GPU 110 may contain other graphics logic units that are not shown in FIG. 1, such as one or more vertex processing units, rasterization units, media processing units, and codecs among others. For the sake of simplicity, the specific logic within the core(s) 112 as well as other graphics-related logic units within the GPU 110 are not shown.

There may be one or more lower level caches in GPU 110 as well, such as cache 114. This cache may be utilized as a general purpose cache or a cache specific to one or more particular types of graphics data (e.g., vertex data). Other lower level caches are not shown, though in some embodiments multiple caches like cache 114 exist within GPU 110.

According to many embodiments, a display controller 116 is communicatively coupled to the GPU 110. The display controller 116 receives information to be displayed upon a display device (e.g., a monitor, a television, a projector, etc.). In many embodiments, the display controller 116 specifically receives frame buffers. Each frame buffer consists of an image comprising pixels, which the display controller interprets and feeds to the display device for viewing. Depending on the refresh frequency of the display device, frame buffers may be fed to the display controller 116 a certain number of times per second. For example, a 60 Hz refresh rate utilizes 60 images (frame buffers of image information) per second. Different display devices may utilize higher frequency refresh rates and simply re-sample the same frame buffer two or more times prior to utilizing a new frame buffer of information to display.

A memory subsystem 118 is also present in FIG. 1. There is a volatile memory controller 120, which may be utilized to provide access to volatile memory 106. Volatile memory controller 120, which is integrated into the CPU package or discrete from the CPU package in different embodiments, may receive a memory access request from a CPU core 102 or a GPU core 112 and route that request to volatile memory 106. Likewise, non-volatile (NV) memory controller 122 may receive a memory access request from a CPU core 102 or a GPU core 112 and route that request to NVRAM 108. In some embodiments, the volatile memory controller 120 and non-volatile memory controller 122 are integrated into one large memory controller. In other embodiments they are separate controllers.

In many embodiments, an input/output (I/O) subsystem 124 is present in the system in FIG. 1 to communicate with I/O devices, such as I/O device(s) 126. Within the I/O subsystem 124, one or more I/O adapter(s) 128 are present to translate a host communication protocol utilized within the CPU 100 to a protocol compatible with particular I/O devices. Some of the protocols that the adapters may translate include Peripheral Component Interconnect (PCI)-Express (PCI-E), 3.0; Universal Serial Bus (USB), 3.0; Serial Advanced Technology Attachment (SATA), 3.0; Small Computer System Interface (SCSI), Ultra-640; and Institute of Electrical and Electronics Engineers (IEEE) 1394 “Firewire”; among others.

Additionally, there may be one or more wireless protocol I/O adapters. Examples of wireless protocols, among others, are used in personal area networks, such as IEEE 802.15 and Bluetooth, 4.0; wireless local area networks, such as IEEE 802.11-based wireless protocols; and cellular protocols.

A Basic Input/Output System (BIOS) flash 130 device may additionally be present in the system to provide a set of boot instructions when the system powers on or reboots. For BIOS flash 130 device, some of the protocols that I/O adapters 128 may translate include Serial Peripheral Interface (SPI) and Microwire among others.

Returning to the NVRAM 108, an overview of the NVRAM is provided below.

1. Non-Volatile Random Access Memory Overview

There are many possible technology choices for NVRAM, including PCM, Phase Change Memory and Switch (PCMS) (the latter being a more specific implementation of the former), byte-addressable persistent memory (BPRAM), storage class memory (SCM), universal memory, Ge2Sb2Te5, programmable metallization cell (PMC), resistive memory (RRAM), RESET (amorphous) cell, SET (crystalline) cell, PCME, Ovshinsky memory, ferroelectric memory (also known as polymer memory and poly(N-vinylcarbazole)), ferromagnetic memory (also known as Spintronics, SPRAM (spin-transfer torque RAM)), STRAM (spin tunneling RAM), magnetoresistive memory, magnetic memory, magnetic random access memory (MRAM), and Semiconductor-oxide-nitride-oxide-semiconductor (SONOS, also known as dielectric memory).

NVRAM has the following characteristics:

it maintains its content even if power is removed, similar to FLASH memory used in solid state disks (SSD), and different from SRAM and DRAM which are volatile;

it has lower overall power consumption than volatile memories such as SRAM and DRAM;

it has random access similar to SRAM and DRAM (also known as randomly addressable);

it is rewritable and erasable at a lower level of granularity (e.g., byte level) than FLASH found in SSDs (which can only be rewritten and erased a “block” at a time, minimally 64 Kbyte in size for NOR FLASH and 16 Kbyte for NAND FLASH);

it is used as a system memory and allocated all or a portion of the system memory address space;

it is capable of being coupled to the CPU over a bus (also interchangeably referred to as an interconnect or link) using a transactional protocol (a protocol that supports transaction identifiers (IDs) to distinguish different transactions so that those transactions can complete out-of-order) and allowing access at a level of granularity small enough to support operation of the NVRAM as system memory (e.g., cache line size such as 64 or 128 byte). For example, the bus/interconnect may be a memory bus (e.g., a double data rate (DDR) bus such as DDR3, DDR4, etc.) over which is run a transactional protocol as opposed to the non-transactional protocol that is normally used. As another example, the bus may be one over which is normally run a transactional protocol (a native transactional protocol), such as a PCI express (PCIE) bus, desktop management interface (DMI) bus, or any other type of bus utilizing a transactional protocol and a small enough transaction payload size (e.g., cache line size such as 64 or 128 byte); and

it also has one or more of the following characteristics:

it has faster write speed than non-volatile memory/storage technologies such as FLASH;

it has very high read speeds (faster than FLASH and near or equivalent to DRAM read speeds);

it is directly writable (rather than requiring erasing (overwriting with 1s) before writing data like FLASH memory used in SSDs); and/or

it allows a greater number of writes before failure (more than boot ROM and FLASH used in SSDs).

As mentioned above, in contrast to FLASH memory, which must be rewritten and erased a complete “block” at a time, the level of granularity at which NVRAM is accessed in any given implementation may depend on the particular memory controller and the particular memory bus or other type of bus to which the NVRAM is coupled. For example, in some implementations where NVRAM is used as system memory, the NVRAM may be accessed at the granularity of a cache line (e.g., a 64-byte or 128-byte cache line), notwithstanding an inherent ability to be accessed at the granularity of a byte, because cache line is the level at which the memory subsystem accesses memory. Thus, when NVRAM is deployed within a memory subsystem, it may be accessed at the same level of granularity as DRAM used in the same memory subsystem. Even so, the level of granularity of access to the NVRAM by the memory controller and memory bus or other type of bus is smaller than that of the block size used by FLASH and the access size of the I/O subsystem's controller and bus.

NVRAM may also incorporate wear leveling algorithms to account for the fact that the storage cells begin to wear out after a number of write accesses, especially where a significant number of writes may occur such as in a system memory implementation. Since high cycle count blocks are most likely to wear out in this manner, wear leveling spreads writes across the far memory cells by swapping addresses of high cycle count blocks with low cycle count blocks. Note that most address swapping is typically transparent to application programs because it is handled by hardware, lower-level software (e.g., a low level driver or operating system), or a combination of the two.
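
By way of illustration only, the address-swapping behavior described above may be sketched as follows. The class name `WearLeveler`, the `swap_threshold` policy, and the in-memory lists are hypothetical and are not part of any described embodiment; an actual implementation would typically reside in hardware or low-level software.

```python
class WearLeveler:
    """Toy wear leveler: swaps a hot block's address with a cold block's."""

    def __init__(self, num_blocks, swap_threshold):
        # remap[logical] -> physical block (the indirection that hides swaps)
        self.remap = list(range(num_blocks))
        self.cycles = [0] * num_blocks    # write cycles per physical block
        self.data = [None] * num_blocks   # contents per physical block
        self.swap_threshold = swap_threshold  # assumed policy parameter (> 0)

    def read(self, logical):
        return self.data[self.remap[logical]]

    def write(self, logical, value):
        phys = self.remap[logical]
        self.data[phys] = value
        self.cycles[phys] += 1
        self._maybe_swap(logical)

    def _maybe_swap(self, hot_logical):
        hot_phys = self.remap[hot_logical]
        # Pick the physical block with the lowest cycle count.
        cold_phys = min(range(len(self.cycles)), key=self.cycles.__getitem__)
        if self.cycles[hot_phys] - self.cycles[cold_phys] < self.swap_threshold:
            return
        cold_logical = self.remap.index(cold_phys)
        # Exchange contents; the migration itself costs one write per block.
        self.data[hot_phys], self.data[cold_phys] = (
            self.data[cold_phys], self.data[hot_phys])
        self.cycles[hot_phys] += 1
        self.cycles[cold_phys] += 1
        # Exchange addresses so callers keep using the same logical numbers.
        self.remap[hot_logical] = cold_phys
        self.remap[cold_logical] = hot_phys
```

Because callers only ever use logical block numbers, repeated writes to one logical block are spread over several physical blocks without the caller observing the swap.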

The NVRAM is distinguishable from other instruction and data memory/storage technologies in terms of its characteristics and/or its application in the memory/storage hierarchy. For example, NVRAM is different from:

static random access memory (SRAM) which may be used for level 0 and level 1 internal processor caches dedicated to each core within a processor and lower level cache (LLC) shared by cores within a processor;

dynamic random access memory (DRAM) configured as a cache internal to a processor die (e.g., on the same die as the processor), configured as one or more caches external to a processor die (e.g., in the same or a different package than the processor die), or general system memory external to the processor package;

FLASH memory/magnetic disk/optical disc applied as mass storage; and

memory such as FLASH memory or other read only memory (ROM) applied as firmware memory (which can refer to boot ROM, BIOS Flash, and/or TPM Flash).

In FIG. 1, NVRAM 108 may be used as instruction and data storage that is directly addressable by a CPU 100 and is able to sufficiently keep pace with the CPU 100 in contrast to FLASH/magnetic disk/optical disc applied as mass storage. Direct addressability refers to a processor, such as a CPU or GPU, being able to send memory requests to the NVRAM as if it were standard DRAM (e.g., through standard memory store and load commands). Moreover, as discussed above and described in detail below, NVRAM 108 may be placed on a memory bus and may communicate directly with a memory controller that, in turn, communicates directly with the processor 100.

NVRAM 108 may be combined with other instruction and data storage technologies (e.g., DRAM) to form hybrid memories (also known as Co-locating PCM and DRAM; first level memory and second level memory; FLAM (FLASH and DRAM)). Note that at least some of the above technologies, including PCM/PCMS, may be used for mass storage instead of, or in addition to, system memory, and need not be random accessible, byte addressable or directly addressable by the processor when applied in this manner.

For convenience of explanation, most of the remainder of the application will refer to “NVRAM” or, more specifically, “PCM,” or “PCMS” as the technology selection for the non-volatile memory. As such, the terms NVRAM, PCM, and PCMS may be used interchangeably in the following discussion. However, it should be realized, as discussed above, that different technologies may also be utilized.

2. Volatile Memory Overview

“Volatile memory” 106 is an intermediate level of memory configured in conjunction with NVRAM 108 that has lower read/write access latency relative to NVRAM 108 and/or more symmetric read/write access latency (i.e., having read times which are roughly equivalent to write times). In some embodiments, the volatile memory 106 has significantly lower write latency than the NVRAM 108 but similar (e.g., slightly lower or equal) read latency; for instance the volatile memory 106 may be a volatile memory such as volatile random access memory (VRAM) and may comprise a DRAM or other high speed capacitor-based memory. Note, however, that the underlying principles of the invention are not limited to these specific memory types. Additionally, the volatile memory 106 may have a relatively lower density and/or may be more expensive to manufacture than the NVRAM 108.

In some embodiments, volatile memory 106 is configured between the NVRAM 108 and the internal processor caches. In some of the embodiments described below, volatile memory 106 is utilized to mask the performance and/or usage limitations of the NVRAM 108 including, for example, read/write latency limitations and memory degradation limitations. In these implementations, the combination of volatile memory 106 and NVRAM 108 operates at a performance level which approximates, is equivalent to, or exceeds that of a system which uses only DRAM as system memory.

In different embodiments, volatile memory 106 can be located on the processor die, located external to the processor die on a separate die located on the CPU package, or located outside the CPU package with a high bandwidth link to the CPU package (for example, on a memory dual in-line memory module (DIMM), a riser/mezzanine, or a computer motherboard). In FIG. 1, volatile memory 106 is shown being located external to the CPU package. The volatile memory 106 may be communicatively coupled with the CPU 100 using a single or multiple high bandwidth links, such as DDR or other transactional high bandwidth links. A communicative coupling of devices refers to being coupled through an electrical, optical, wireless, or other form of link or combination of links to allow information to be passed back and forth between the devices that are coupled to one another. In some embodiments, the coupling is direct and allows information to pass directly from the first device to the second and, potentially, vice versa. In other embodiments, the coupling is indirect and requires the information to pass through one or more additional devices that reside along the route the information takes while being transferred between the two communicatively coupled devices in question.

3. Hybrid Memory Code and Data Storage for Processors

According to many embodiments, a hybrid memory solution in a computer system, combining DRAM and PCMS storage or another type of NVRAM, may be utilized to access both the code and data sections of a software application being executed. As mentioned above, although other forms of NVRAM may be applicable to these solutions, the technical specifications of PCMS make it a quality candidate technology for non-volatile memory/storage. Thus, the examples shown will make use of PCMS memory, although in other embodiments, another form of NVRAM may be utilized.

In a computer system that includes an amount of PCMS memory/storage, software applications and an operating system running on the computer system may both take advantage of a PCMS device's fast read capability. The fast reads may allow the software application and operating system to execute code directly from PCMS on the processor.

In many embodiments, the operating system running on the computer system may be aware of the PCMS storage and will execute the operating system binaries directly out of physical PCMS address space. This execution is different from other forms of non-volatile memory, such as mass storage solutions, which instead require loading code (binaries) into a volatile memory, such as DRAM, prior to executing the code.

The operating system will build an interface for the software application and any driver software running on the operating system. According to many embodiments, the application and/or the driver will send requests to the operating system for buffers to store binary executable code that will not be modified during execution. The operating system then grants these buffers and provides pointers to them in the software application's and/or driver's allocated memory address space. Because the binaries in these buffers are not modified, there is generally an explicit indication/designation originated by the software, driver, and/or the operating system to make the buffers unmodifiable.

The operating system running on the system may treat the PCMS storage as a physical DRAM and map portions of the files that make up the operating system and software application into a memory management unit (MMU). When in the MMU, these file locations can be translated and stored similar to translation lookaside buffer (TLB) entries and addressed as if they were standard DRAM memory locations. From an operating system's logical address space to a physical PCMS device address location there may be multiple page walks to first get from a logical address that the operating system utilizes, to a platform physical address (PPA) that the underlying general memory management hardware utilizes for main DRAM memory, and finally to a direct physical PCMS device address. There may be one or more address mapping tables 132 (also known as address indirection tables) to accomplish this. In many embodiments, the address mapping tables 132 are stored in DRAM (volatile memory 106).
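
As a minimal sketch of the translation chain described above (logical address, then platform physical address, then PCMS device address), the two lookups can be modeled with page-granular tables. The 4096-byte page size, the dictionary-based tables, and the function names are assumptions made for illustration only; real page walks are multi-level and performed by hardware.

```python
PAGE_SIZE = 4096  # assumed page size; not specified in the description


def translate(addr, table):
    """Map one address through a page-granular table (page -> frame)."""
    page, offset = divmod(addr, PAGE_SIZE)
    return table[page] * PAGE_SIZE + offset  # KeyError means "not mapped"


def logical_to_device(logical_addr, os_page_table, address_mapping_table):
    """Logical address -> platform physical address (PPA) -> PCMS address."""
    ppa = translate(logical_addr, os_page_table)      # first walk
    return translate(ppa, address_mapping_table)      # second walk


# Hypothetical mappings: logical page 0 -> PPA frame 5 -> PCMS frame 2.
os_page_table = {0: 5}
address_mapping_table = {5: 2}
device_addr = logical_to_device(0x10, os_page_table, address_mapping_table)
```

The byte offset within the page is preserved across both stages; only the page number is remapped at each step.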

Apart from binary code storage in PCMS, there are other portions of software application, driver, and operating system information that may be stored adequately in physical PCMS space. Specifically, certain data (as opposed to code) sections of the software also may be storable in PCMS when that data is static or nearly static. Thus, in many embodiments there is at least a bit designation per data element that signifies whether the particular data element to be stored is or is not static. The static designated data elements may be stored in static (i.e., unmodifiable) buffers, whereas the non-static designated data elements may be stored in volatile (i.e., modifiable) buffers. This particular designation per buffer allows the OS to:

-   Determine when to allocate a buffer in actual DRAM vs. PCMS storage, wherein the buffer is allocated in DRAM if the buffer is designated as volatile and allocated in PCMS when the buffer is designated as static.
-   When allocating a buffer in DRAM, the operating system may simply follow a legacy page allocation procedure.
-   When allocating a buffer in PCMS (i.e., a buffer not requiring modification), the operating system may map physical PCMS device address space into its page tables to allow direct memory accesses to the PCMS device.
-   Optionally, the operating system may enforce the read only characteristics of a buffer by setting only read permissions for the PCMS buffer.
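
The per-buffer designation listed above can be illustrated with a short sketch. The `Buffer` class, the `allocate_buffer` function, and the `enforce_read_only` flag are hypothetical names introduced here; they model the policy only and do not correspond to any actual operating system API.

```python
from dataclasses import dataclass


@dataclass
class Buffer:
    backing: str         # "DRAM" or "PCMS"
    writable: bool       # models the page permissions set by the OS
    contents: bytearray

    def write(self, offset, payload):
        if not self.writable:
            raise PermissionError("buffer is mapped read-only")
        self.contents[offset:offset + len(payload)] = payload


def allocate_buffer(initial, volatile, enforce_read_only=True):
    """Place volatile buffers in DRAM and static buffers in PCMS."""
    if volatile:
        # Legacy page allocation path: ordinary read/write DRAM pages.
        return Buffer("DRAM", True, bytearray(initial))
    # Static buffer: mapped from PCMS, optionally with read permission only.
    return Buffer("PCMS", not enforce_read_only, bytearray(initial))
```

For example, `allocate_buffer(b"const", volatile=False).write(0, b"x")` raises `PermissionError`, which mirrors the optional read-only enforcement for PCMS buffers.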

FIG. 2 describes an embodiment of volatile and non-volatile memory spaces when utilizing a software application hybrid code and data storage system.

In FIG. 2, a DRAM memory space 200 (e.g., 4 GB of space) and an NVRAM memory space 202 (e.g., 64 GB of space) are present. There may be additional memory spaces, including a logical memory space that the operating system controls, but those are not shown for the sake of clarity. An operating system 204 is resident in DRAM memory space 200 and is being executed on the CPU 100. In many embodiments, driver software 206 is also resident in DRAM memory space 200 and is executing on the CPU 100 in conjunction with the operating system 204. At some point during operation, the operating system 204 loads a software application 208 into DRAM memory space 200. The load operation includes the operating system 204 reserving memory space for the software application 208. The reserved memory space includes space that is utilized to store the software application's binary executable code (“code”) as well as space that is utilized to store any modifiable data (“data”) the software application 208 uses to run.

As mentioned above, according to many embodiments, the code is generally static, as the binary files do not change during execution. On the other hand, data may or may not change, depending on the type of data. For example, static data (e.g., constants) will not change but volatile data (e.g., a stored variable that is recalculated based on continually changing input) does change. Thus, code and static data can be placed in a read-only section of memory, whereas volatile data would generally be placed in a read/write section of memory. In many embodiments, the software application 208 may be given the opportunity to preset portions of data and code with a storage-type bit. For example, if a structure is defined by the software application as volatile (meaning data in the structure is modifiable), the software application may expressly relay that by setting a “volatile” bit for that structure. Then, when the operating system 204 is allocating memory space to store the structure, it will see the “volatile” bit as set and know to allocate the structure in the volatile data section of DRAM memory space. Alternatively, if a data variable is declared as a constant value, the software application may clear the “volatile” bit to tell the operating system to allocate the data as static.

In a PCMS-based system, it is entirely plausible to store the code and static data in PCMS (i.e., NVRAM) memory space while still storing volatile data in DRAM memory. Thus, according to many embodiments, the operating system sets up an address mapping table 132 that stays resident in DRAM memory space 200. The operating system, while loading the software application 208, specifically stores the code (e.g., code 1 (210) and code 2 (212)) and static data 214 into NVRAM memory space 202 while storing the volatile data 216 in DRAM memory space 200. Once loaded, the operating system creates a group of code pointers 218 in the address mapping table 132 to point to each block of code stored in the NVRAM memory space 202. The operating system 204 also creates a group of data pointers 220 in the address mapping table 132 to point to each block of data stored in the NVRAM memory space 202 as well as each block of data stored in the DRAM memory space 200.

FIG. 3 is a flow diagram of an embodiment of a process to allocate volatile and non-volatile memory for code and data sections of a software application. The process is performed by processing logic which may comprise hardware, software, firmware, or a combination of two or more of these listed forms of processing logic.

The process begins with processing logic receiving a request to allocate memory for a portion of a software application (processing block 300). The request may come in the form of an automated or user-initiated decision to execute the software application. Once it is requested to launch and execute the software application, processing logic receives code (such as a binary executable file) and data (such as a data file) elements of the software application to load into one or more forms of memory to have access during execution of the software application.

For a given portion of the software application to be allocated, processing logic then determines whether that portion comprises code or data (processing block 302). If the portion comprises code, then processing logic allocates a segment of NVRAM memory space for storage of the code (processing block 304). If the portion comprises data, then processing logic next determines whether the data is volatile (processing block 306).

In many embodiments, there is a “volatile” bit that may be set or cleared for any given portion of the data to inform the processing logic whether the data may change during execution of the software application. In other embodiments that are not shown, there is no volatile bit available; instead, processing logic initially loads all data into NVRAM memory space and, whenever a portion of the data is rewritten, processing logic determines that the portion is volatile and moves that element of data from an NVRAM allocated storage location to a DRAM allocated storage location.
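
The alternative embodiment without a volatile bit, in which data starts in NVRAM and is moved to DRAM on its first rewrite, can be sketched as follows. The `LazyPlacement` class and its dictionary-backed stores are hypothetical stand-ins for the two memory spaces and are offered only as one possible reading of that embodiment.

```python
class LazyPlacement:
    """All data starts in NVRAM; the first rewrite migrates it to DRAM."""

    def __init__(self):
        self.nvram = {}  # name -> value, initially holds every data element
        self.dram = {}   # name -> value, holds elements found to be volatile

    def load(self, name, value):
        self.nvram[name] = value

    def location(self, name):
        if name in self.dram:
            return "DRAM"
        if name in self.nvram:
            return "NVRAM"
        raise KeyError(name)

    def read(self, name):
        store = self.dram if name in self.dram else self.nvram
        return store[name]

    def write(self, name, value):
        if self.location(name) == "NVRAM":
            # First rewrite: treat the element as volatile and move it.
            del self.nvram[name]
        self.dram[name] = value
```

Elements that are never rewritten stay in NVRAM for the lifetime of the application, so only data that actually changes consumes DRAM.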

Returning to block 306, if the volatile bit is not set, then processing logic allocates a segment of NVRAM memory space for storage of the portion of the static data (processing block 308). Alternatively, if the volatile bit is set, then processing logic allocates a segment of DRAM memory space for storage of the volatile data (processing block 310).

In any event, once memory space is allocated (from any of processing blocks 304, 308, or 310), then processing logic updates the address mapping table resident in DRAM with a pointer to the allocated memory segment in one of the three allocated sections of memory (i.e., the volatile data section, the static data section, or the code section).
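
The flow of FIG. 3 can be summarized in a short sketch. The `Allocator` class, the simple bump-pointer allocation, and the `(memory, offset)` pointer representation are assumptions made for illustration; the comments indicate which processing block each branch corresponds to.

```python
class Allocator:
    """Sketch of the FIG. 3 decision flow with a DRAM-resident mapping table."""

    def __init__(self):
        self.next_free = {"NVRAM": 0, "DRAM": 0}  # assumed bump allocation
        self.address_mapping_table = {
            "code": [], "static data": [], "volatile data": [],
        }

    def allocate(self, kind, size, volatile_bit=False):
        if kind == "code":                       # block 302 -> block 304
            memory, section = "NVRAM", "code"
        elif kind == "data":
            if volatile_bit:                     # block 306 -> block 310
                memory, section = "DRAM", "volatile data"
            else:                                # block 306 -> block 308
                memory, section = "NVRAM", "static data"
        else:
            raise ValueError("portion must be 'code' or 'data'")
        pointer = (memory, self.next_free[memory])
        self.next_free[memory] += size
        # Final step: record a pointer to the new segment in the table.
        self.address_mapping_table[section].append(pointer)
        return pointer
```

Each call returns the pointer that was recorded, so a caller can confirm that code and static data land in NVRAM while volatile data lands in DRAM.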

Although application software is utilized as the example, this particular hybrid use of NVRAM and DRAM is not limited to application software. The operating system and driver binaries may also be similarly divided up, with the static portions placed in NVRAM while the volatile portions are placed in DRAM. This provides significant benefit in terms of reducing the amount of time required to resume from low power states, because portions of the operating system and/or drivers would no longer need to be copied back into DRAM when resuming from a hibernate-to-disk state.

4. Hybrid Memory Code and Data Storage for Embedded Controllers

Many computing platforms have a number of embedded processors that areused for a wide range of applications. An embedded processor can also bereferred to as a “microcontroller.” Examples of embedded processorsinclude manageability engines (which manage a computer system's securityand out-of-band communications) and graphics embedded controllers, amongothers. These embedded controllers use firmware during runtime asinstructions to execute many of their core functionalities. Thisfirmware is typically stored on the computer system's flash memory(e.g., BIOS) or mass storage drive (e.g., hard drive drive, solid statedrive, etc.). At system boot, the firmware is generally loaded into aninternal SRAM in the embedded controller for execution.

There are significant issues with storing the firmware externally on the system flash memory or hard drive, such as:

-   The firmware can be tampered with, so embedded microcontrollers generally are required to perform an authentication at load time to verify the integrity of the firmware before execution.
-   Firmware stored either in the system flash memory or on a hard drive is subject to corruption, which can cripple the functionality of the embedded microcontrollers.
-   Execution must be preceded by a copy of the firmware into the local SRAM, which is time consuming.

To address these limitations of embedded controllers, according to many embodiments, a PCMS-based storage is packaged with the embedded controller. The logic utilized to effectively replace system flash and/or a hard drive with a localized portion of PCMS memory would include:

-   A small amount of PCMS-based storage inside the component accessible to the embedded controller.
-   A segment of the PCMS-based storage set aside for the firmware code.
-   A segment of the PCMS-based storage set aside for storing persistent data that an application might want to store across boot sessions (e.g., digital rights management keys, offline movie usage records, etc.).
-   A cryptographic verification module that allows the PCMS firmware image to be overwritten only if the image is authentic.
-   An optional SRAM can be added to the embedded engine for runtime data storage (e.g., local variables, a stack, etc.) if the PCMS write latency adversely affects performance.
-   During boot of the computer system, the embedded controller may execute the firmware code directly out of the internal PCMS storage. Executing directly from PCMS is made possible because PCMS read latency, unlike that of system flash or a hard drive, matches that of DRAM.

FIG. 4 illustrates an embodiment of a computer system that includes an embedded controller which has direct access to a PCMS device internal to an I/O subsystem.

The computer system shown in FIG. 4 may be generally equivalent to the computer system described in FIG. 1, outside of changes in the I/O subsystem 124. Thus, for detailed descriptions of other components, please refer to FIG. 1. Within the I/O subsystem 124, there is an embedded controller 400 and an internal PCMS memory 402 present in FIG. 4. The internal PCMS 402 stores static information related to firmware that the embedded controller 400 executes. During boot, the embedded controller 400 may directly execute firmware code from a read only (R-O) code region 404 stored in the internal PCMS 402. Additionally, the internal PCMS 402 may also store other information 406, such as fixed (i.e., static) data, keys for security applications (e.g., digital rights management), as well as usage information to be saved for a later time (e.g., a number of times a user watches a given movie on the computer system). This stored information 406 is data that is either entirely static or rarely updated and also may include data that requires non-volatility through power cycling the computer system.

Furthermore, in many embodiments, the embedded controller 400 has an internal SRAM 408 portion of memory for storage of local variables, a stack, and/or other information that dynamically changes throughout the execution of the firmware. Thus, this information benefits from the fast write capabilities of the SRAM, as opposed to being limited to PCMS write speeds.

According to many embodiments, there is additionally a cryptographic verification (CV) hardware module 410 present in the I/O subsystem 124. The CV module is able to utilize hardware security verification key technology to require any updates/overwrites to any region of the stored firmware on the PCMS to be authenticated through any implemented form of security verification (e.g., public and private authentication modules using keys).
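The gating role of the CV module can be illustrated with a short Python sketch in which a firmware image is overwritten only when an accompanying authentication tag verifies. The class and method names are hypothetical, and the keyed HMAC-SHA-256 check is a stand-in chosen because it is available in the Python standard library; the description itself leaves the verification scheme open and mentions public and private keys as one example.

```python
import hashlib
import hmac


class PcmsFirmwareStore:
    """Hypothetical model of a PCMS firmware region guarded by a CV module."""

    def __init__(self, verification_key: bytes, initial_image: bytes):
        self._key = verification_key
        self._image = initial_image

    @property
    def image(self) -> bytes:
        """Firmware image currently held in the (modeled) PCMS region."""
        return self._image

    def update(self, new_image: bytes, tag: bytes) -> bool:
        """Overwrite the stored image only if the tag authenticates it."""
        expected = hmac.new(self._key, new_image, hashlib.sha256).digest()
        # Constant-time comparison avoids leaking how many bytes matched.
        if not hmac.compare_digest(expected, tag):
            return False  # reject: the stored image is left untouched
        self._image = new_image
        return True
```

A rejected update leaves the previous image in place, which mirrors the requirement that every overwrite of the stored firmware be authenticated before it takes effect.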

FIG. 5 describes an embodiment of volatile and non-volatile memory spaces in an I/O subsystem when utilizing a hybrid firmware code and data storage system.

In FIG. 5, the embedded controller 400 has an internal SRAM memory space 500 that stores runtime data storage 502. The embedded controller 400 is additionally communicatively coupled to internal PCMS memory space 504, which stores a firmware code region 506 and other storage 508 that stores fixed/static data, keys, and required non-volatile usage information.
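The memory spaces of FIG. 5 can be pictured as a small memory map. The Python sketch below is illustrative only: the base addresses, region sizes, and the `writable` flag are assumptions made for the example, and the firmware code region is modeled as not writable at runtime on the assumption that updates to it go through a separate authenticated path.

```python
from typing import NamedTuple, Optional


class Region(NamedTuple):
    name: str
    device: str
    base: int       # hypothetical base address
    size: int       # hypothetical size in bytes
    writable: bool  # whether ordinary runtime writes are allowed


MEMORY_MAP = (
    Region("runtime data storage 502", "SRAM 500", 0x00000000, 0x4000, True),
    Region("firmware code region 506", "PCMS 504", 0x10000000, 0x20000, False),
    Region("other storage 508", "PCMS 504", 0x10020000, 0x8000, True),
)


def find_region(address: int) -> Optional[Region]:
    """Return the region containing the address, or None if unmapped."""
    for region in MEMORY_MAP:
        if region.base <= address < region.base + region.size:
            return region
    return None


def can_write(address: int) -> bool:
    """Runtime writes are allowed only in mapped, writable regions."""
    region = find_region(address)
    return region is not None and region.writable
```

Under these assumptions, stack and local-variable traffic lands in SRAM, instruction fetches are served directly from the PCMS code region, and only the rarely updated keys and usage records incur PCMS write latency.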

In the following description, numerous specific details such as logic implementations, means to specify operands, resource partitioning/sharing/duplication implementations, types and interrelationships of system components, and logic partitioning/integration choices are set forth in order to provide a more thorough understanding of the present invention. It will be appreciated, however, by one skilled in the art that the invention may be practiced without such specific details. In other instances, control structures, gate level circuits and full software instruction sequences have not been shown in detail in order not to obscure the invention. Those of ordinary skill in the art, with the included descriptions, will be able to implement appropriate functionality without undue experimentation.

References in the specification to “one embodiment,” “an embodiment,” “an example embodiment,” etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.

In the following description and claims, the terms “coupled” and “connected,” along with their derivatives, may be used. It should be understood that these terms are not intended as synonyms for each other. “Coupled” is used to indicate that two or more elements, which may or may not be in direct physical or electrical contact with each other, co-operate or interact with each other. “Connected” is used to indicate the establishment of communication between two or more elements that are coupled with each other.

Embodiments of the invention may also be provided as a computer program product which may include a non-transitory machine-readable medium having stored thereon instructions which may be used to program a computer (or other electronic device) to perform a process. The non-transitory machine-readable medium may include, but is not limited to, floppy diskettes, optical disks, CD-ROMs, and magneto-optical disks, ROMs, RAMs, EPROMs, EEPROMs, magnetic or optical cards, propagation media or other type of media/machine-readable medium suitable for storing electronic instructions. Embodiments of the invention may also be downloaded as a computer program product, wherein the program may be transferred from a remote computer (e.g., a server) to a requesting computer (e.g., a client) by way of data signals embodied in a carrier wave or other propagation medium via a communication link (e.g., a modem or network connection).

While the invention has been described in terms of several embodiments, those skilled in the art will recognize that the invention is not limited to the embodiments described, but can be practiced with modification and alteration within the spirit and scope of the appended claims. The description is thus to be regarded as illustrative instead of limiting.

1-14. (canceled)
15. A system, comprising: a byte addressable volatile memory device to store volatile data segments of a software application, based on a first indication type of the volatile data segments being subject to read and write; and a byte addressable random access nonvolatile memory device to store code segments and static data segments of the software application, based on a second indication type of the code segments and static data segments being subject to read and not to write.
16. The system of claim 15, wherein the first and second indication types are to be set by a host operating system.
17. The system of claim 15, wherein the first and second indication types are to be set by a driver.
18. The system of claim 15, further comprising: a memory bus; wherein the volatile memory device and the nonvolatile memory device are coupled to the memory bus.
19. The system of claim 18, wherein the memory bus comprises a double data rate memory bus.
20. The system of claim 15, wherein the nonvolatile memory device comprises a phase change memory device.
21. The system of claim 15, wherein the phase change memory device comprises a phase change memory and switch (PCMS) device.
22. The system of claim 15, wherein the nonvolatile memory device comprises a resistive random access memory device.
23. The system of claim 15, wherein the nonvolatile memory device comprises a magnetic-based random access memory device.
24. The system of claim 15, wherein the nonvolatile memory device is to provide data access at cacheline granularity.
25. The system of claim 15, further comprising: a processor; and a memory controller coupled to the processor, and coupled to the volatile memory device and the nonvolatile memory device.
26. The system of claim 25, wherein the processor is to execute a host operating system including a memory management unit (MMU), the MMU to include address mapping to the volatile memory device and to the nonvolatile memory device, wherein the MMU is to allocate application segments to selected address space based on whether an application segment is to be read and written, or only read.
27. A method for memory management, comprising: loading an application into system memory, the application including volatile data segments having a first indication type of being subject to read and write, and code segments and static data segments having a second indication type of being subject to read and not to write; allocating address space in a byte addressable volatile memory device to store the volatile data segments based on the first indication type; and allocating address space in a byte addressable random access nonvolatile memory device to store the code segments and the static data segments based on the second indication type.
28. The method of claim 27, wherein allocating the address space in the volatile memory device and the nonvolatile memory device comprises allocating with an application programming interface (API).
29. The method of claim 28, wherein allocating with the API comprises allocating through either a host operating system or a driver calling the API.
30. The method of claim 27, wherein the volatile memory device and the nonvolatile memory device are coupled to a common system memory bus.
31. The method of claim 27, wherein the volatile memory device comprises a memory device compatible with a double data rate standard, and wherein the nonvolatile memory device comprises: a phase change memory device, a phase change memory and switch (PCMS) device, a resistive random access memory device, or a magnetic-based random access memory device, or a combination of these.
32. The method of claim 27, wherein allocating the address space in the volatile memory device and the nonvolatile memory device comprises allocating with a memory management unit (MMU).
33. The method of claim 32, further comprising storing file location information in a translation lookaside buffer (TLB) to identify address space in the volatile memory device for the volatile data segments, and in the nonvolatile memory device for the code segments and the static data segments.
34. A computer readable storage medium having content stored thereon, which when accessed by a computer device causes the computer device to perform operations including: loading an application into system memory, the application including volatile data segments having a first indication type of being subject to read and write, and code segments and static data segments having a second indication type of being subject to read and not to write; allocating address space in a byte addressable volatile memory device to store the volatile data segments based on the first indication type; and allocating address space in a byte addressable random access nonvolatile memory device to store the code segments and the static data segments based on the second indication type.
35. The computer readable storage medium of claim 34, wherein the content to cause allocating the address space in the volatile memory device and the nonvolatile memory device comprises content to cause allocating with an application programming interface (API).
36. The computer readable storage medium of claim 34, wherein the volatile memory device and the nonvolatile memory device are coupled to a common system memory bus.
37. The computer readable storage medium of claim 34, wherein the volatile memory device comprises a memory device compatible with a double data rate standard, and wherein the nonvolatile memory device comprises: a phase change memory device, a phase change memory and switch (PCMS) device, a resistive random access memory device, or a magnetic-based random access memory device, or a combination of these.
38. The computer readable storage medium of claim 34, wherein the content to cause allocating the address space in the volatile memory device and the nonvolatile memory device comprises content to cause allocating with a memory management unit (MMU).
39. The computer readable storage medium of claim 38, further comprising content to cause storing file location information in a translation lookaside buffer (TLB) to identify address space in the volatile memory device for the volatile data segments, and in the nonvolatile memory device for the code segments and the static data segments.